Dynamicity vs. Effectiveness: A User Study of a Clustering Algorithm for Scatter/Gather

Authors

  • Weimao Ke
  • Cassidy R. Sugimoto
  • Javed Mostafa
Abstract

We proposed and implemented a novel clustering algorithm called LAIR2, which has linear worst-case time complexity and constant average running time for on-the-fly Scatter/Gather browsing [4]. Our previous experiments showed that the LAIR2 online clustering algorithm, running on a single processor, was several hundred times faster than the parallel Buckshot algorithm running on multiple processors [11]. This paper reports on a study that examined the effectiveness of the LAIR2 algorithm in terms of clustering quality and its impact on retrieval performance. We conducted a user study with 24 subjects to evaluate on-the-fly LAIR2 clustering in Scatter/Gather search tasks by comparing its performance to that of the Buckshot algorithm, a classic method for Scatter/Gather browsing [4]. Results showed significant differences in subjective perceptions of clustering quality: subjects perceived that the LAIR2 algorithm produced significantly better clusters than the Buckshot method did. Subjects also felt that it took less effort to complete the tasks with the LAIR2 system, which they found more effective in helping them with the tasks. Interesting patterns also emerged from the subjects’ comments in the final open-ended questionnaire. We discuss the implications and directions for future research.

Similar resources

Parallel and Distributed Scatter-Gather Clustering System Development Proposal

From the scatter-gather process explained above, it is easy to see that the essence of a parallel version of this algorithm is the parallel clustering algorithm used in the scatter phase. Frieder et al. implement a parallel version of the Buckshot clustering algorithm [1]. Their work meets the needs of a parallel scatter-gather clustering algorithm quite well, although we can des...

Repeated Record Ordering for Constrained Size Clustering

One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and the social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into existing clustering algorithms. One well-researched constrained clustering technique is microaggregation. In a microaggreg...

Hybrid ANFIS with ant colony optimization algorithm for prediction of shear wave velocity from a carbonate reservoir in Iran

Shear wave velocity (Vs) data are key information for petrophysical, geophysical, and geomechanical studies. Although compressional wave velocity (Vp) measurements exist for almost all wells, shear wave velocity was not recorded for most older wells because the necessary tools were lacking. Furthermore, measuring shear wave velocity is somewhat costly. This study proposes a novel methodolo...

A Fast Online Clustering Algorithm for Scatter/Gather Browsing

We present a fast online clustering algorithm that has linear worst-case time complexity and constant average running time for the well-known, visually oriented online browsing model called Scatter/Gather browsing (Cutting, Karger, Pedersen, and Tukey 1992). Our experiment shows that, when running on a single processor, this fast online clustering algorithm is a few hundred times faster than the par...

Evolutionary User Clustering Based on Time-Aware Interest Changes in the Recommender System

The abundance of data on the Internet has created problems for users and made it confusing to find the proper information. In addition, users' tastes and preferences change over time. Recommender systems can help users find useful information. Because interests change, these systems must be able to evolve. In order to solve this problem, users are clustered that determine the most desirable users, it pa...

Publication date: 2008